Development and Transcription of Assamese Speech Corpus
Abstract
A balanced speech corpus is a basic requirement for any speech processing task. In this report we describe our effort to develop an Assamese speech corpus, focusing mainly on the issues and challenges faced during its development. Because Assamese has received little computational attention, this is the first effort to develop a speech corpus for the language. As corpus development is an ongoing process, we report only the initial task here.
Keywords: Speech Corpus; Assamese; Transcription
Similar resources
Part of Speech Tagger for Assamese Text
Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic work has been done for this language. In this paper, we present our work on part of speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined suitable tagset was available, we de...
Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture
A speech corpus is one of the major components of a speech processing system, where one of the primary requirements is to recognize an input sample. The quality and detail captured in the speech corpus directly affect the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as a part of an ANN based Speech Recognit...
A Structured Approach for Building Assamese Corpus: Insights, Applications and Challenges
A well-structured text corpus is essential for studying naturally occurring phenomena in natural language text. The quality and structure of a corpus can directly influence the performance of various Natural Language Processing applications. Assamese is one of the major Indian languages used by the people of north east India. Language technology development works in Assame...
A Study on Detection of Intonation Events of Assamese Speech Required for Tilt Model
This paper presents a study and experimental analysis of different intonation events in Assamese speech. Assamese is a North East Indian language spoken by lakhs of people in India. Researchers need an intonation model to identify language-specific intonation events, which are necessary for the synthesis process of that particular language. The paper shows outcomes of some experiments done wit...
Development of Speech corpora for different Speech Recognition tasks in Malayalam language
A speech corpus is the backbone of an automatic speech recognition system. This paper presents the development of speech corpora for different speech recognition tasks in the Malayalam language. A pronunciation dictionary and a transcription file, the other two essential resources for building a speech recognizer, are also being created. Speech recognition performance of different speech recognit...
Journal title:
- CoRR
Volume: abs/1309.7312; Issue: -
Pages: -
Publication date: 2013